1987-01-20
.pa
VECTOR EMULATIONS
Vector emulations are software procedures that mimic the operation of
vector processing hardware. Of course, the software is not based on the
same principle as the hardware, but the concept is the same: specialized
procedures designed to perform similar, repetitive tasks on contiguously
stored real numbers as efficiently as possible. No, I won't tell you how
I do it, so don't ask. My vector emulations are completely compatible
with Hewlett-Packard's Vector Instruction Set (VIS): they have the same
calling syntax and function. (That is why I developed them in the first
place: I was downloading programs from an HP-1000F.) HP has a very nice
manual with examples. If you are interested, perhaps they would sell you
one (I wouldn't even hazard a guess as to the cost).
Vector Instruction Set (VIS) User's Manual
Part No. 12824-90001
Hewlett-Packard Company
Data Systems Division
11000 Wolfe Road
Cupertino, CA 95014
You do not need a math coprocessor (Intel 8087/80287) to run a program
linked with LIBRY.LIB, but it makes a TREMENDOUS difference (a factor of
120 or so for floating-point operations). The vector emulations will run
without a math coprocessor, but in that case floating point is already
so slow that they make no noticeable difference. With a coprocessor, the
improvement from the vector emulations depends on the relative speed of
your processor and coprocessor. The greatest improvement is realized on
a PC with a 5MHz-8086/5MHz-8087 pair, and the least on an AT with an
8MHz-80286/5MHz-80287 pair.
Note that the increments (INCR1, INCR2, INCR3), the index (M), and the
count (N) are of type INTEGER*2. Reals are of type REAL*4, and double
precision reals are of type REAL*8. REAL*4 and REAL*8 arguments cannot
be mixed in the same emulation. For double precision, use
"CALL DVABS(...)" rather than "CALL VABS(...)".
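To make these type rules concrete, here is a FORTRAN equivalent of DVDOT, the double precision VDOT. It is written in the style of the VADD sample. It is only a sketch of what the operation computes, not the LIBRY.LIB code. Every real argument, including the result S, is REAL*8, while the increments and the count are INTEGER*2. Returning S=0 when N<1 is an assumption.

```fortran
      SUBROUTINE DVDOT(S,V1,INCR1,V2,INCR2,N)
!     Illustrative FORTRAN equivalent of DVDOT, not the LIBRY.LIB
!     code: S = SUM(V1(I)*V2(I),I=1,N) in double precision.
!     All reals (V1, V2 and the result S) are REAL*8; the
!     increments and the count are INTEGER*2.
!     Assumption: S is returned as 0 when N<1.
      INTEGER*2 INCR1,INCR2,N,I,I1,I2
      REAL*8 S,V1(*),V2(*)
      S=0.0D0
      IF(N.LT.1) RETURN
      I1=1
      I2=1
      DO 100 I=1,N
        S=S+V1(I1)*V2(I2)
        I1=I1+INCR1
        I2=I2+INCR2
  100 CONTINUE
      RETURN
      END
```

The argument types are effectively the whole interface. A FORTRAN compiler of this vintage does no conversion across a CALL, so REAL*4 data passed where REAL*8 is expected is simply misread. That is one practical reason the two types cannot be mixed in one emulation.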
It is very important to BE SURE THAT NO VECTOR CROSSES A SEGMENT
BOUNDARY (refer to section 8 of the Microsoft FORTRAN manual). To the
machine, this means that a vector must reside within a single segment
(65536 bytes); otherwise the emulation cannot address all of its
elements as a group. To ensure this, NEVER use the $LARGE metacommand.
If you have no COMMON, you never have to worry about this. If you do
have COMMON, make sure that no COMMON block contains more than 65536
bytes. Of course, you can have several named COMMONs, so this is not too
restrictive a limit on your programs. Also, when more than one vector is
passed to an emulation, the vectors need not reside in the same segment.
For instance, you can add one REAL*4 vector of 16384 elements (16384 x 4
bytes = 65536 bytes) to another of 16384 elements and store the result
in a third, as long as they are all in different COMMONs. Of course, you
can also add two vectors in the same COMMON, provided their combined
number of elements does not exceed 16384. There is a way around this,
but it is too involved to explain here.
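The 16384 figure is simply 65536 bytes divided by 4 bytes per REAL*4 element. For REAL*8 the limit is 8192 elements. The hypothetical function FITSEG below (not part of LIBRY.LIB) applies the same arithmetic to a vector with increment INCR, whose elements span 1+(N-1)*|INCR| positions. It is a sketch under that assumption and checks size only. A vector that passes can still straddle a boundary if it does not start where a segment starts. The text above relies on keeping each COMMON within 65536 bytes to take care of that.

```fortran
      LOGICAL FUNCTION FITSEG(N,INCR,NBYTES)
!     Hypothetical helper, not part of LIBRY.LIB.  .TRUE. if a
!     vector of N elements of NBYTES bytes each, taken with
!     stride INCR, spans no more than one 64K segment (65536
!     bytes).  It checks size only, not where the vector starts.
      INTEGER*2 N,INCR,NBYTES
!     INTEGER*4 because 65536 does not fit in INTEGER*2.
      INTEGER*4 LN,LINC,NSPAN
      LN=N
      LINC=ABS(INCR)
!     Positions from the first element touched to the last one.
      NSPAN=(1+(LN-1)*LINC)*NBYTES
      FITSEG = NSPAN .LE. 65536
      RETURN
      END
```

For example, 16384 REAL*4 elements with INCR=1 fit exactly. So do 8192 REAL*8 elements. With INCR=2, only 8192 REAL*4 elements fit.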
A word of warning... vector emulations do not like being interrupted.
This is the whole point of "speed at any cost" procedures. For this
reason, the emulations may interfere with the operation of some "pop-up"
programs and such things as windowing and multi-tasking. This is
regrettably unpredictable. I can say that the emulations don't interfere
with any of the "pop-up" programs that I have developed (like my DOS
command stack full-screen editor and improved scroller) that "lurk" in
the background; but I don't know about such programs that others have
developed.
.pa
SAMPLE FORTRAN EQUIVALENT OF A VECTOR ADD
      SUBROUTINE VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)
C
C     VECTOR V3=V1+V2
C     ASSUMED-SIZE ARRAYS (*): WITH INCR>1 THE SUBSCRIPTS RUN PAST N
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*),V2(*),V3(*)
C
      IF(N.LT.1) GO TO 999
      I1=1
      I2=1
      I3=1
C
      DO 100 I=1,N
      V3(I3)=V1(I1)+V2(I2)
      I1=I1+INCR1
      I2=I2+INCR2
  100 I3=I3+INCR3
C
  999 RETURN
      END
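Routines such as VMAB return an index M rather than a vector. The FORTRAN equivalent below is a sketch in the style of the VADD sample, written from the operation given in the summary table. It is not the LIBRY.LIB code. Three details are assumptions:

- M counts positions 1..N along the (possibly strided) vector, which equals the subscript only when INCR1=1.
- The first of several equal maxima wins.
- M=0 when N<1.

```fortran
      SUBROUTINE VMAB(M,V1,INCR1,N)
!     Illustrative FORTRAN equivalent of VMAB in the style of the
!     VADD sample; not the LIBRY.LIB code.  M is set to the
!     position of the element with the largest absolute value.
!     Assumptions: M counts positions 1..N along the strided
!     vector, the first of equal maxima wins, M=0 when N<1.
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*)
      M=0
      IF(N.LT.1) RETURN
      M=1
      BIG=ABS(V1(1))
      I1=1
      DO 100 I=2,N
        I1=I1+INCR1
        IF(ABS(V1(I1)).GT.BIG) THEN
          BIG=ABS(V1(I1))
          M=I
        END IF
  100 CONTINUE
      RETURN
      END
```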
.pa
.ft c
.in 15
SUMMARY OF VECTOR INSTRUCTION SET
----------------------------------------------------------------------------------------------------------------------
                                                                      SPEED IMPROVEMENT: PC WITH 8087    HP-1000F
CALLING SYNTAX                              OPERATION                     VECTOR LENGTH:   N=10  N=100   N=10  N=100
----------------------------------------------------------------------------------------------------------------------
CALL VABS(V1,INCR1,V2,INCR2,N)              (V2(I)=ABS(V1(I)),I=1,N)                        4.0    7.5    4.5    5.1
CALL VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)     (V3(I)=V1(I)+V2(I),I=1,N)                       3.3    3.8    4.9    4.8
CALL VDIV(V1,INCR1,V2,INCR2,V3,INCR3,N)     (V3(I)=V1(I)/V2(I),I=1,N)                       2.5    3.2    5.0    5.7
CALL VDOT(S,V1,INCR1,V2,INCR2,N)            S=SUM(V1(I)*V2(I),I=1,N)                        4.0    4.8    3.5    3.6
CALL VMAB(M,V1,INCR1,N)                     ABS(V1(M))=AMAX1(ABS(V1(I)),I=1,N)              3.5    4.4    3.6    3.6
CALL VMAX(M,V1,INCR1,N)                     V1(M)=AMAX1(V1(I),I=1,N)                        3.5    3.3    4.2    4.4
CALL VMIB(M,V1,INCR1,N)                     ABS(V1(M))=AMIN1(ABS(V1(I)),I=1,N)              3.8    4.8    3.7    3.2
CALL VMIN(M,V1,INCR1,N)                     V1(M)=AMIN1(V1(I),I=1,N)                        3.5    3.5    4.2    3.9
CALL VMOV(V1,INCR1,V2,INCR2,N)              (V2(I)=V1(I),I=1,N)                             3.3    9.0    5.2    6.5
CALL VMPY(V1,INCR1,V2,INCR2,V3,INCR3,N)     (V3(I)=V1(I)*V2(I),I=1,N)                       3.5    3.8    4.8    4.7
CALL VNRM(S,V1,INCR1,N)                     S=SUM(ABS(V1(I)),I=1,N)                         5.3    4.7    3.8    4.5
CALL VPIV(S,V1,INCR1,V2,INCR2,V3,INCR3,N)   (V3(I)=S*V1(I)+V2(I),I=1,N)                     3.4    3.5    4.6    5.2
CALL VSAD(S,V1,INCR1,V2,INCR2,N)            (V2(I)=S+V1(I),I=1,N)                           3.4    4.0    3.7    4.2
CALL VSDV(S,V1,INCR1,V2,INCR2,N)            (V2(I)=S/V1(I),I=1,N)                           3.0    3.2    4.5    4.4
CALL VSMY(S,V1,INCR1,V2,INCR2,N)            (V2(I)=S*V1(I),I=1,N)                           3.4    4.0    4.0    4.5
CALL VSSB(S,V1,INCR1,V2,INCR2,N)            (V2(I)=S-V1(I),I=1,N)                           3.4    5.3    3.6    4.1
CALL VSUB(V1,INCR1,V2,INCR2,V3,INCR3,N)     (V3(I)=V1(I)-V2(I),I=1,N)                       3.3    3.8    5.5    5.6
CALL VSUM(S,V1,INCR1,N)                     S=SUM(V1(I),I=1,N)                              3.5    4.3    3.5    4.1
CALL VSWP(V1,INCR1,V2,INCR2,N)              (V1(I)<->V2(I),I=1,N)                           3.2    5.0    5.0    5.7
CALL VMIX(INDEX,V1,V2,N)                    (V2(I)=V1(INDEX(I)),I=1,N)                      1.8    2.7     NA     NA
CALL VMXI(INDEX,V1,V2,N)                    (V2(INDEX(I))=V1(I),I=1,N)                      1.8    1.7     NA     NA
CALL CLAMP(VMIN,VMAX,V,N)                   (V(I)=AMAX1(VMIN,AMIN1(VMAX,V(I))),I=1,N)       8.0    9.0     NA     NA
H=HORNER(C,X,N)                             H=SUM(C(I)*X**(I-1),I=1,N)                      3.5    4.3     NA     NA
----------------------------------------------------------------------------------------------------------------------
The above table shows, for instance, that an emulated add (VADD) of two vectors of length 100 runs 3.8 times as fast
as the same operation coded in FORTRAN on a "stock" PC with an 8087 math coprocessor.
note 1: there is little or no improvement for N<10, and runtimes may increase for N<5.
note 2: for double precision, add a "D" prefix (e.g. DVABS, DVADD, ..., DCLAMP, DHORNE).
note 3: vectors must not cross a segment boundary (see section 8 of the Microsoft FORTRAN user's guide).
note 4: all integers (e.g. INCR1, INCR2, INCR3, M, N, ...) are of the INTEGER*2 type.
note 5: increments (viz. INCR1, INCR2, INCR3) can be positive, negative, or zero.
.ft e
.in 10
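VMIX, VMXI, CLAMP and HORNER take no increments. The FORTRAN equivalents below are sketches written from the OPERATION column of the summary table, not the LIBRY.LIB code. A few details to note:

- INDEX is INTEGER*2 per note 4. As a dummy argument it simply hides the intrinsic INDEX function.
- HORNER evaluates the polynomial by Horner's rule rather than computing each power X**(I-1).
- Returning 0 from HORNER when N<1 is an assumption.

```fortran
      SUBROUTINE VMIX(INDEX,V1,V2,N)
!     Gather: V2(I)=V1(INDEX(I)), I=1..N.
      INTEGER*2 INDEX(*),N,I
      REAL*4 V1(*),V2(*)
      DO 100 I=1,N
        V2(I)=V1(INDEX(I))
  100 CONTINUE
      RETURN
      END

      SUBROUTINE VMXI(INDEX,V1,V2,N)
!     Scatter: V2(INDEX(I))=V1(I), I=1..N.
      INTEGER*2 INDEX(*),N,I
      REAL*4 V1(*),V2(*)
      DO 100 I=1,N
        V2(INDEX(I))=V1(I)
  100 CONTINUE
      RETURN
      END

      SUBROUTINE CLAMP(VMIN,VMAX,V,N)
!     In place: V(I)=AMAX1(VMIN,AMIN1(VMAX,V(I))), I=1..N.
      INTEGER*2 N,I
      REAL*4 VMIN,VMAX,V(*)
      DO 100 I=1,N
        V(I)=AMAX1(VMIN,AMIN1(VMAX,V(I)))
  100 CONTINUE
      RETURN
      END

      REAL FUNCTION HORNER(C,X,N)
!     H=SUM(C(I)*X**(I-1),I=1,N) by Horner's rule:
!     H = C(1)+X*(C(2)+X*(C(3)+...)), one multiply and one add
!     per coefficient.  Assumption: returns 0 when N<1.
      INTEGER*2 N,I
      REAL*4 C(*),X
      HORNER=0.0
      IF(N.LT.1) RETURN
      HORNER=C(N)
      DO 100 I=N-1,1,-1
        HORNER=HORNER*X+C(I)
  100 CONTINUE
      RETURN
      END
```

VMIX gathers scattered elements into a packed vector, and VMXI scatters them back. With INDEX=(3,1,2), VMIX turns (10,20,30) into (30,10,20).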